Error-Correcting Output Codes for Multi-Label Text Categorization

Authors

  • Giuliano Armano
  • Camelia Chira
  • Nima Hatami
Abstract

When a sample can belong to more than one label from a set of available classes, the classification problem (known as multi-label classification) becomes more complicated. Text data, widely available nowadays on the World Wide Web, is an obvious example of such a task. This paper presents a new method for multi-label text categorization created by modifying the Error-Correcting Output Coding (ECOC) technique. Using a set of complementary binary classifiers, ECOC has proven to be efficient for multi-class problems. The proposed method, called ML-ECOC, is a first attempt to extend the ECOC algorithm to handle multi-label tasks. Experimental results on the Reuters benchmark (RCV1-v2) demonstrate the potential of the proposed method for multi-label text categorization.
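
As a rough illustration of the standard multi-class ECOC idea that the abstract builds on, the sketch below assigns each class a binary codeword and decodes a vector of binary classifier outputs to the class with the nearest codeword in Hamming distance. This is not the ML-ECOC method itself, whose details are not given here; the code matrix, the class names, and the function names are all hypothetical and chosen only for the example.

```python
# Minimal sketch of standard multi-class ECOC decoding (NOT ML-ECOC).
# Assumption: each of the 5 columns corresponds to one binary classifier,
# and the codewords below are a hypothetical code with minimum distance 3.
from itertools import combinations

CODE_MATRIX = {
    "class_a": (0, 0, 0, 0, 0),
    "class_b": (0, 1, 0, 1, 1),
    "class_c": (1, 0, 1, 0, 1),
    "class_d": (1, 1, 1, 1, 0),
}


def hamming(u, v):
    """Count the positions in which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def min_code_distance(code_matrix):
    """Smallest pairwise Hamming distance between any two codewords."""
    return min(hamming(u, v) for u, v in combinations(code_matrix.values(), 2))


def decode(bits, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda label: hamming(code_matrix[label], bits))


if __name__ == "__main__":
    # One binary classifier (third bit) is wrong; decoding still recovers class_b.
    noisy_prediction = (0, 1, 1, 1, 1)
    print(decode(noisy_prediction))
    print(min_code_distance(CODE_MATRIX))
```

With a minimum codeword distance of d, nearest-codeword decoding can tolerate up to floor((d - 1) / 2) wrong binary outputs, which is one error for this hypothetical code.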

Similar resources

Multi-class Text Categorization with Error Correcting Codes

Automatic text categorization has become a vital topic in many applications. Imagine for example the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding for classification scheme needs resources increasing linearly with the number of classes. A different solution uses an error correcting code, increasing in length with O(log2(n)) only. In t...

Multi-class Classification with Error Correcting Codes

Automatic text categorization has become a vital topic in many applications. Imagine for example the automatic classification of Internet pages for a search engine database. The traditional 1-of-n output coding for classification scheme needs resources increasing linearly with the number of classes. A different solution uses an error correcting code, increasing in length with O(log2(n)) only. I...

Learning with Limited Supervision by Input and Output Coding

In many real-world applications of supervised learning, only a limited number of labeled examples are available because the cost of obtaining high-quality examples is high or the prediction task is very specific. Even with a relatively large number of labeled examples, the learning problem may still suffer from limited supervision as the dimensionality of the input space or the complexity of th...

Multi-Label Output Codes using Canonical Correlation Analysis

Traditional error-correcting output codes (ECOCs) decompose a multi-class classification problem into many binary problems. Although it seems natural to use ECOCs for multi-label problems as well, doing so naively creates issues related to: the validity of the encoding, the efficiency of the decoding, the predictability of the generated codeword, and the exploitation of the label dependency. Us...

Multi-label classification using error correcting output codes

A framework for multi-label classification extended by Error Correcting Output Codes (ECOCs) is introduced and empirically examined in the article. The solution assumes the base multi-label classifiers to be a noisy channel and applies ECOCs in order to recover the classification errors made by individual classifiers. The framework was examined through exhaustive studies over combinations of th...

Publication date: 2012